20 research outputs found

    Data-Driven Methods for Managing Anomalies in Energy Time Series

    With the progressing implementation of the smart grid, more and more smart meters record power or energy consumption and generation as time series. The increasing availability of these recorded energy time series enables the automated operation of smart grid applications such as load analysis, load forecasting, and load management. However, to perform well, these applications usually require clean data that describes the typical behavior of the underlying system. Unfortunately, recorded energy time series are usually not clean but contain anomalies, i.e., patterns that deviate from what is considered normal. Since anomalies may represent false or misleading information, they can be problematic for any analysis of this data performed by smart grid applications. Therefore, the present thesis proposes data-driven methods for managing anomalies in energy time series. It introduces an anomaly management concept whose characteristics correspond to steps in a sequential pipeline, namely anomaly detection, anomaly compensation, and a subsequent application. Using forecasting as an exemplary subsequent application and real-world data with inserted synthetic and labeled anomalies, this thesis answers four research questions along that pipeline. Based on the answers to these four research questions, the presented anomaly management exhibits four characteristics. First, it is guided by well-defined anomalies derived from real-world energy time series. These anomalies serve as a basis for generating synthetic anomalies in energy time series and thereby promote the development of powerful anomaly detection methods. Second, it applies an anomaly detection approach to energy time series that is capable of providing a high detection performance.
    Third, it compensates detected anomalies in energy time series realistically by considering the characteristics of the respective data. Fourth, it applies and evaluates general anomaly management strategies with regard to the subsequent forecasting that uses this data. The comparison shows that managing anomalies well is essential: the compensation strategy, which detects and compensates anomalies in the input data before applying a forecasting method, is the most beneficial strategy when the input data contains anomalies.
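    The detect-compensate-forecast pipeline described above can be sketched with simple stand-ins for each step. The robust z-score detector and the interpolation-based compensation below are illustrative placeholders, not the thesis's actual methods:

```python
import numpy as np

def detect_anomalies(series, threshold=3.0):
    """Flag points deviating more than `threshold` robust z-scores from the median."""
    median = np.median(series)
    mad = np.median(np.abs(series - median))
    z = 0.6745 * (series - median) / (mad + 1e-9)
    return np.abs(z) > threshold

def compensate_anomalies(series, mask):
    """Replace flagged points by linear interpolation over the clean neighbors."""
    clean = series.astype(float).copy()
    idx = np.arange(len(series))
    clean[mask] = np.interp(idx[mask], idx[~mask], series[~mask])
    return clean

# Compensation strategy: clean the input data before any forecasting is applied.
rng = np.random.default_rng(0)
data = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.05 * rng.standard_normal(200)
data[50] = 10.0                              # inserted synthetic anomaly
cleaned = compensate_anomalies(data, detect_anomalies(data))
```

    A forecasting method would then be trained on `cleaned` instead of `data`, which corresponds to the compensation strategy found most beneficial above.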

    RO-Crate Time Series Exporter for the Building Consumption Data of KIT Campus North

    The facility management (FM) of the Karlsruhe Institute of Technology (KIT) operates an infrastructure for measuring energy consumption in order to invoice other organizational units within KIT for the energy they consume. For this purpose, the measuring infrastructure automatically records and stores the energy consumption of all buildings on Campus North at a resolution of 15 minutes. The recorded consumption comprises different energy types, namely electricity, gas, heat, water (warm, cooling, drinking, and several kinds of wastewater), and compressed air. Since this measurement infrastructure has been in operation since 2006, the consumption time series meanwhile cover a long period of time, which makes them highly interesting for the energy research community, especially for energy researchers at KIT. However, accessing the data is challenging. While the original infrastructure was designed for single-user access and limited data throughput, it now faces multiple users and high data throughput. Moreover, since the technology used does not scale with the ever-growing data volumes, FM eventually updated the data infrastructure. Despite improvements in performance, however, the new data infrastructure brings new challenges, including data that has only partially been moved to the new infrastructure. For this reason, retrieving a time series whose time range spans both the old and the new infrastructure requires a researcher to write queries for both database systems, which in turn requires knowing the complicated logic of both database schemas. Even after successfully querying such a time series, a researcher needs further queries to obtain the measurement units, measurement quantities, and scaling factors required to interpret the data.
    Given both this challenging data access and the increasing interest in the data, we started to simplify the process of data querying by developing a web service with a simple REST (Fielding, 2000) interface. This interface allows researchers to query data in a unified way, without requiring any knowledge about the underlying databases, and thereby lowers the hurdles of accessing the data. The interface requires only a time range, a list of buildings, and energy types as inputs and returns a ZIP file containing the time series as CSV files and an RO-Crate (Soiland-Reyes et al., 2022) metadata file in JSON. The metadata file fully describes the requested energy consumption time series by using the RO-Crate data package standard with an extended, in-house developed profile for time series description. This RO-Crate metadata file enables an interpretation of the obtained data without any prior knowledge and reduces the burden on researchers to publish the data according to good scientific practice. Since much research using energy consumption data benefits from including exogenous influences such as weather (Dannecker, 2015; Haben et al., 2023), the developed web service also allows obtaining weather time series for the specified time range, which are again described in the RO-Crate metadata file. The present poster shows the steps taken to develop the web service: It starts with the analysis of the original database schemas before describing the agreement on the required information, which resulted in a shared database schema. The poster continues with the transformation of the original data into the shared schema that forms the data foundation of the service. Next, the poster presents the creation of the time series profile, the standards and vocabularies, the technologies used to develop the service, and the challenges during the development of the software. The poster concludes with an outlook on planned improvements and extensions of the developed web service.

    Using weather data in energy time series forecasting: the benefit of input data transformations

    Renewable energy systems depend on the weather, and weather information thus plays a crucial role in forecasting time series within such renewable energy systems. However, while weather data are commonly used to improve forecast accuracy, it still has to be determined in which input shape this weather data benefits the forecasting models the most. In the present paper, we investigate how transformations for weather data inputs, i.e., station-based and grid-based weather data, influence the accuracy of energy time series forecasts. The selected weather data transformations are based on statistical features, dimensionality reduction, clustering, autoencoders, and interpolation. We evaluate the performance of these weather data transformations when forecasting three energy time series: electrical demand, solar power, and wind power. Additionally, we compare the best-performing weather data transformations for station-based and grid-based weather data. We show that transforming station-based or grid-based weather data improves the forecast accuracy by between 3.7% and 5.2% compared to using the raw weather data, depending on the target energy time series, with statistical and dimensionality reduction transformations among the best.
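    As an illustration of one of the transformation families named above, a PCA-based dimensionality reduction of grid-based weather inputs can be sketched as follows. The toy data stands in for a weather grid; the paper's actual transformations, datasets, and forecasting models differ:

```python
import numpy as np

def pca_transform(weather, n_components=3):
    """Reduce many weather grid points to a few principal components."""
    centered = weather - weather.mean(axis=0)
    # SVD-based PCA: rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# 1000 time steps of 50 correlated grid points (toy stand-in for gridded weather).
rng = np.random.default_rng(1)
base = rng.standard_normal((1000, 3))
weather = base @ rng.standard_normal((3, 50)) + 0.1 * rng.standard_normal((1000, 50))
features = pca_transform(weather, n_components=3)
```

    The compact `features` matrix would then replace the 50 raw grid columns as the exogenous input of a forecasting model.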

    ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles

    Data-driven building energy prediction is an integral part of the process for measurement and verification, building benchmarking, and building-to-grid interaction. The ASHRAE Great Energy Predictor III (GEPIII) machine learning competition used an extensive meter data set to crowdsource the most accurate machine learning workflow for whole-building energy prediction. A significant component of the winning solutions was the pre-processing phase to remove anomalous training data. Contemporary pre-processing methods focus on filtering by statistical threshold values or on deep learning methods requiring training data and multiple hyper-parameters. A recent method named ALDI (Automated Load profile Discord Identification) managed to identify these discords using the matrix profile, but the technique still requires user-defined parameters. We develop ALDI++, a method based on this previous work that bypasses user-defined parameters and takes advantage of discord similarity. We evaluate ALDI++ against a statistical threshold, a variational autoencoder, and the original ALDI as baselines in discord classification and energy forecasting scenarios. Our results demonstrate that while the classification performance improvement over the original method is marginal, ALDI++ achieves the best forecasting error, improving 6% over the winning team's approach with six times less computation time. Comment: 10 pages, 5 figures, 3 tables.
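    The matrix profile underlying ALDI and ALDI++ records, for every subsequence of a time series, the z-normalized distance to its nearest non-trivial match; discords are the subsequences with the largest such distance. A deliberately naive quadratic-time sketch (the real methods use far more efficient matrix profile algorithms):

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive matrix profile: nearest-neighbor z-normalized distance per subsequence."""
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-9)
    mp = np.full(n, np.inf)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - m // 2): i + m // 2 + 1] = np.inf  # exclude trivial matches near i
        mp[i] = d.min()
    return mp

# A daily-profile-like load signal with one anomalous (flat) day, the discord.
day = np.sin(np.linspace(0, 2 * np.pi, 24))
ts = np.tile(day, 10)
ts[5 * 24: 6 * 24] = 0.0
mp = matrix_profile(ts, m=24)
discord = int(np.argmax(mp))        # subsequence with the largest profile value
```

    Repeated daily profiles have close matches elsewhere in the series, so their profile values stay low, while the anomalous day has no match and stands out as the discord.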

    Data-Driven Copy-Paste Imputation for Energy Time Series

    Smart meters are a cornerstone of the worldwide transition to smart grids. They typically collect and provide energy time series that are vital for various applications, such as grid simulations, fault detection, load forecasting, load analysis, and load management. Unfortunately, these time series are often characterized by missing values that must be handled before the data can be used. A common approach to handle missing values in time series is imputation. However, existing imputation methods are designed for power time series and do not take into account the total energy of gaps, resulting in jumps or constant shifts when imputing energy time series. In order to overcome these issues, the present paper introduces the new Copy-Paste Imputation (CPI) method for energy time series. The CPI method copies data blocks with similar properties and pastes them into gaps of the time series while preserving the total energy of each gap. The new method is evaluated on a real-world dataset that contains six shares of artificially inserted missing values between 1% and 30%. It outperforms by far the three benchmark imputation methods selected for comparison. The comparison furthermore shows that the CPI method uses matching patterns and preserves the total energy of each gap while requiring only a moderate run-time. Comment: 8 pages, 7 figures, submitted to IEEE Transactions on Smart Grid; the first two authors contributed equally to this work.
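    The core idea, copying a matching block and rescaling it so that the gap's total energy is preserved, can be sketched for a single gap in a cumulative meter-reading series. The actual CPI method's block matching and handling of multiple gaps are more elaborate; here the donor block is simply the same time slot one day earlier:

```python
import numpy as np

def cpi_sketch(cum_energy, period=96):
    """Fill a single NaN gap in a cumulative energy series: copy the increments
    observed one period earlier and rescale them so that the gap's total energy,
    known from the readings bordering the gap, is preserved."""
    filled = cum_energy.copy()
    nan_idx = np.flatnonzero(np.isnan(filled))
    start, end = nan_idx[0], nan_idx[-1] + 1         # gap spans filled[start:end]
    # Donor: the end-start+1 increments covering the same slots one period back.
    donor = np.diff(filled[start - 1 - period: end + 1 - period])
    total = filled[end] - filled[start - 1]          # energy consumed across the gap
    scaled = donor * total / donor.sum()             # preserve the gap's total energy
    filled[start:end] = filled[start - 1] + np.cumsum(scaled)[:end - start]
    return filled

# Toy meter readings at 15-minute resolution (96 values/day) with one gap.
profile = 1.0 + 0.5 * np.sin(np.linspace(0, 2 * np.pi, 96, endpoint=False))
readings = np.concatenate([[0.0], np.cumsum(np.tile(profile, 4))])
readings[150:160] = np.nan
imputed = cpi_sketch(readings)
```

    Because the imputed increments are rescaled to the energy difference between the readings bordering the gap, the result avoids the jumps and constant shifts mentioned above.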

    Review of automated time series forecasting pipelines

    Get PDF
    Time series forecasting is fundamental for various use cases in different domains such as energy systems and economics. Creating a forecasting model for a specific use case requires an iterative and complex design process. The typical design process includes the five sections (1) data pre-processing, (2) feature engineering, (3) hyperparameter optimization, (4) forecasting method selection, and (5) forecast ensembling, which are commonly organized in a pipeline structure. One promising approach to handle the ever-growing demand for time series forecasts is automating this design process. The present paper thus analyzes the existing literature on automated time series forecasting pipelines to investigate how to automate the design process of forecasting models. Thereby, we consider both Automated Machine Learning (AutoML) and automated statistical forecasting methods in a single forecasting pipeline. For this purpose, we first present and compare the proposed automation methods for each pipeline section. Second, we analyze the automation methods regarding their interaction, combination, and coverage of the five pipeline sections. For both, we discuss the literature, identify problems, give recommendations, and suggest future research. This review reveals that the majority of papers cover only two or three of the five pipeline sections. We conclude that future research must consider the automation of the forecasting pipeline holistically to enable the large-scale application of time series forecasting.
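    A minimal skeleton of such a five-section pipeline, with deliberately naive stand-ins for each section (linear interpolation, lag features, and a least-squares autoregression), might look as follows:

```python
import numpy as np

def preprocess(ts):
    """(1) Data pre-processing: linear interpolation of missing values."""
    idx = np.arange(len(ts))
    mask = np.isnan(ts)
    ts = ts.copy()
    ts[mask] = np.interp(idx[mask], idx[~mask], ts[~mask])
    return ts

def make_features(ts, n_lags=3):
    """(2) Feature engineering: lagged values as regressors."""
    X = np.column_stack([ts[i:len(ts) - n_lags + i] for i in range(n_lags)])
    return X, ts[n_lags:]

def fit_forecaster(X, y):
    """(3)+(4) Hyperparameter optimization and method selection, collapsed here
    into a single least-squares autoregression with an intercept."""
    coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
    return coef

def forecast(coef, last_lags):
    """(5) Ensembling omitted; a one-step forecast from the fitted model."""
    return float(np.r_[last_lags, 1.0] @ coef)

ts = np.sin(np.linspace(0, 12 * np.pi, 300))
ts[42] = np.nan
clean = preprocess(ts)
X, y = make_features(clean)
pred = forecast(fit_forecaster(X, y), clean[-3:])
```

    An automated pipeline in the sense of this review would search over the choices hard-coded here, e.g., the imputation method, the number of lags, the model class, and its hyperparameters.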